7 research outputs found

    Scaling and Resilience in Numerical Algorithms for Exascale Computing

    The first Petascale supercomputer, the IBM Roadrunner, went online in 2008. Ten years later, the community is now looking ahead to a new generation of Exascale machines. During the decade that has passed, several hundred Petascale-capable machines have been installed worldwide, yet despite the abundance of machines, applications that scale to their full size remain rare. Large clusters now routinely have 50,000+ cores, and some have several million. This extreme level of parallelism, which has allowed a theoretical compute capacity in excess of a million billion operations per second, turns out to be difficult to use in many applications of practical interest. Processors often end up spending more time waiting for synchronization, communication, and other coordinating operations to complete than actually computing. Component reliability is another challenge facing HPC developers. If even a single processor among many thousands fails, the user is forced to restart traditional applications, wasting valuable compute time. These issues collectively manifest themselves as low parallel efficiency, resulting in wasted energy and computational resources. Future performance improvements are expected to continue to come in large part from increased parallelism. One may therefore speculate that the difficulties currently faced when scaling applications to Petascale machines will progressively worsen, making it difficult for scientists to harness the full potential of Exascale computing. The thesis comprises two parts. Each part consists of several chapters discussing modifications of numerical algorithms to make them better suited for future Exascale machines. In the first part, the use of Parareal, a Parallel-in-Time integration technique, for the scalable numerical solution of partial differential equations is considered. We propose a new adaptive scheduler that optimizes the parallel efficiency by minimizing the time-subdomain length without making communication of time-subdomains too costly. In conjunction with an appropriate preconditioner, we demonstrate that it is possible to obtain time-parallel speedup on the nonlinear shallow water equation, beyond what is possible using conventional spatial domain-decomposition techniques alone. The part is concluded with the proposal of a new method for constructing Parallel-in-Time integration schemes better suited for convection-dominated problems. In the second part, new ways of mitigating the impact of hardware failures are developed and presented. The topic is introduced with the creation of a new fault-tolerant variant of Parareal. In the chapter that follows, a C++ library for multi-level checkpointing is presented. The library uses lightweight in-memory checkpoints, protected through the use of erasure codes, to mitigate the impact of failures by decreasing the overhead of checkpointing and minimizing the compute work lost. Erasure codes have the unfortunate property that if more data blocks are lost than parity blocks were created, the data is effectively considered unrecoverable. The final chapter contains a preliminary study on partial information recovery for incomplete checksums. Under the assumption that some meta-knowledge exists on the structure of the data encoded, we show that the lost data may be recovered, at least partially. This result is of interest not only in HPC but also in data centers, where erasure codes are widely used to protect data efficiently.
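
The erasure-code limitation mentioned in this abstract (losing more data blocks than there are parity blocks makes the data unrecoverable) can be illustrated with the simplest possible code, a single XOR parity block. This is only a sketch under that assumption. It is not the checkpointing library from the thesis, and the function names `xor_blocks` and `recover` are hypothetical.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks; used both to encode and to decode."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover(blocks, parity):
    """Restore at most one lost block (marked as None) from a single parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        # One parity block can only repair one erasure.
        raise ValueError("more blocks lost than parity blocks; unrecoverable")
    restored = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        # XOR of all surviving blocks and the parity yields the lost block.
        restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored


# Hypothetical checkpoint data split into three equally sized blocks.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)
repaired = recover([data[0], None, data[2]], parity)
```

With one block lost, `repaired` equals `data` again. With two blocks lost, `recover` raises `ValueError`, which is the situation the final chapter's partial-recovery study addresses.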

    Communication-aware adaptive parareal with application to a nonlinear hyperbolic system of partial differential equations

    In the strong scaling limit, the performance of conventional spatial domain decomposition techniques for the parallel solution of PDEs saturates. When sub-domains become small, halo communication and other overhead come to dominate. A potential path beyond this scaling limit is to introduce domain decomposition in time. One popular approach is the Parareal algorithm, which has received a lot of attention due to its generality and potential scalability. Low efficiency, particularly on convection-dominated problems, has however limited the adoption of the method. In this paper we introduce a new strategy, Communication Aware Adaptive Parareal (CAAP), to overcome some of these challenges. With CAAP, we choose time-subdomains short enough that convergence of the Parareal algorithm is quick, yet long enough that the overhead of communicating time-subdomain interfaces does not induce a new limit to parallel speed-up. Furthermore, we propose an adaptive work scheduling algorithm that overlaps consecutive Parareal cycles and decouples the number of time-subdomains from the number of active node-groups in an efficient manner to allow for comparatively high parallel efficiency. We demonstrate the viability of CAAP through the parallel-in-time integration of a hyperbolic system of PDEs in the form of the two-dimensional nonlinear shallow-water wave equation, solved using a 3rd-order accurate WENO-RK discretization. For the computationally cheap approximate operator needed as a preconditioner in the Parareal corrections, we use a lower-order Roe-type discretization. Time-parallel integration of purely hyperbolic evolution problems is traditionally considered impractical. Through large-scale numerical experiments we demonstrate that with CAAP, it is possible not only to obtain time-parallel speedup on this class of evolution problems, but also to obtain parallel acceleration beyond what is possible using conventional spatial domain-decomposition techniques alone. The approach is widely applicable for parallel-in-time integration over long time domains, regardless of the class of evolution problem.
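
For readers unfamiliar with the basic Parareal iteration that CAAP builds on, the following is a minimal serial sketch of the standard update U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n]), where G is a cheap coarse propagator and F an accurate fine one. It assumes a scalar ODE and explicit Euler for both propagators. It does not reproduce the WENO-RK and Roe discretizations or the adaptive scheduler of the paper, and the names `euler` and `parareal` are hypothetical.

```python
def euler(f, y, t0, t1, steps):
    """Explicit Euler from t0 to t1 in `steps` equal steps."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t0, t1, n_sub, n_iter, coarse_steps=1, fine_steps=100):
    """Plain Parareal on n_sub time-subdomains (run serially here)."""
    ts = [t0 + (t1 - t0) * n / n_sub for n in range(n_sub + 1)]

    def coarse(y, n):  # cheap approximate propagator G
        return euler(f, y, ts[n], ts[n + 1], coarse_steps)

    def fine(y, n):  # accurate, expensive propagator F
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    # Initial guess: one sequential sweep with the coarse propagator.
    u = [y0]
    for n in range(n_sub):
        u.append(coarse(u[n], n))

    for _ in range(n_iter):
        # These evaluations are independent per subdomain; in a real
        # implementation this is the part distributed across processors.
        f_old = [fine(u[n], n) for n in range(n_sub)]
        g_old = [coarse(u[n], n) for n in range(n_sub)]
        # Sequential coarse sweep with the Parareal correction.
        u_new = [y0]
        for n in range(n_sub):
            u_new.append(coarse(u_new[n], n) + f_old[n] - g_old[n])
        u = u_new
    return ts, u


def decay(t, y):
    """Test problem dy/dt = -y (an assumption chosen for illustration)."""
    return -y


ts, u = parareal(decay, 1.0, 0.0, 2.0, n_sub=4, n_iter=4)
serial_fine = euler(decay, 1.0, 0.0, 2.0, 400)
```

After as many iterations as there are time-subdomains, the Parareal result matches the serial fine solution up to rounding, so any speedup depends on converging in far fewer iterations. That is why the choice of time-subdomain length matters in the abstract above.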

    An Adjoint Approach for Stabilizing the Parareal Method

    The parareal algorithm seeks to extract parallelism in the time-integration direction of time-dependent differential equations. While it has been applied with success to a wide range of problems, it suffers from stability issues when applied to non-dissipative problems. We express the method through an iteration matrix and show that the problematic behavior is related to the non-normal structure of the iteration matrix. To enforce monotone convergence, we propose an adjoint parareal algorithm, accelerated by the Conjugate Gradient Method. Numerical experiments confirm the stability and suggest directions for further improving the performance.

    Intracellular sorting and transport of proteins

    No full text